Chapter X ENCODING SYNTACTIC ANNOTATION

Author

  • Laurent Romary
Abstract

There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while also enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools. To answer this need, we have developed a framework comprising an abstract model for a variety of annotation types (e.g., morpho-syntactic tagging, syntactic annotation, coreference annotation), which can be instantiated in different ways depending on the annotator’s approach and goals. The results have been incorporated into XCES (Ide et al., 2000a), the XML instantiation of the Corpus Encoding Standard (Ide, 1998a,b), which provides a ready-made, standard encoding format together with a data architecture designed specifically for linguistically annotated corpora.


Similar resources

Chapter 16 ENCODING SYNTACTIC ANNOTATION

There is a widely recognized need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while at the same time enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing an...


An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...


Encoding Syntactic Annotation

There is a need for a general framework for linguistic annotation that is flexible and extensible enough to accommodate different annotation types and different theoretical and practical approaches, while at the same time enabling their representation in a “pivot” format that can serve as the basis for comparative evaluation, merging, and the development of reusable editing and processing tools...


A Powerful and Versatile XML Format for Representing Role-semantic Annotation

We present two XML formats for the description and encoding of semantic role information in corpora. The TIGER/SALSA XML format provides a modular representation for semantic roles and syntactic structure. The Text-SALSA XML format is a lightweight version of TIGER/SALSA XML designed for manual annotation with an XML editor rather than a special tool. Both formats can deal with underspecificati...


Syntactic Wordclass Tagging

Part-of-speech (POS) tagging is one of the most popular and thoroughly researched tasks in the field of natural language processing, particularly since it is a prerequisite for a wide variety of more complex tasks. The book Syntactic Wordclass Tagging is a multiauthor collection of articles giving advice on how to use and implement a POS tagger. Part I of the book is entitled "The User's View" ...




Publication date: 2001